Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the amount of llvm IR instantiated #205

Merged
merged 33 commits into from
Jan 26, 2021
Merged

Conversation

Marwes
Copy link
Contributor

@Marwes Marwes commented Sep 29, 2020

This works to improve the generated code in a similar way to #204 , however it is entirely orthogonal to it but also far more invasive.

The main change here is the introduction of RawTableInner which is contains all the fields that RawTable has but without being parameterized by T. Methods have been moved to this new type in parts or their entirety and RawTable forwards to these methods.

For this small test case with 4 different maps there is a reduction of the number of llvm lines generated by -17% (18088 / 21873 =0.82695560737) .

fn main() {
    let mut map1 = hashbrown::HashMap::new();
    map1.insert(1u8, "");
    map1.reserve(1000);
    let mut map2 = hashbrown::HashMap::new();
    map2.insert(1i16, "");
    map2.reserve(1000);
    let mut map3 = hashbrown::HashMap::new();
    map3.insert(3u16, "");
    map3.reserve(1000);
    let mut map4 = hashbrown::HashMap::new();
    map4.insert(3u64, "");
    map4.reserve(1000);
    dbg!((
        map1.iter().next(),
        map2.iter().next(),
        map3.iter().next(),
        map4.iter().next()
    ));
}

The commits are almost entirely orthogonal (except the first which does the main refactoring to support the rest) and if some are not desired they can be removed. If it helps, this PR could also be split up into multiple.

For most commitst I don't expect any performance degradation (unless LLVM stops inlining some function as it is now called more), but there are a couple of commits that does slow parts down however these should only be in the cold parts of the code (for instance, panic handling).

@Marwes Marwes changed the title Smaller ir Reduce the amount of llvm IR instantiated Sep 29, 2020
Copy link
Member

@Amanieu Amanieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try running the benchmarks (cargo bench) to see if any of the changes have a performance impact?

src/raw/mod.rs Outdated Show resolved Hide resolved
src/raw/mod.rs Outdated Show resolved Hide resolved
src/raw/mod.rs Outdated
}
}

impl RawTableInner {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All methods here should be #[inline] since they are non-generic.

@Marwes
Copy link
Contributor Author

Marwes commented Oct 2, 2020

Didn't bother to inline the private functions since I think it is redundant within the same compilation unit(?) But I guess they are used from generic functions which causes them to be used from another compilation unit.

That seemed to be enough to restore performance though.

 name                         old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
 clone_from_large             1,272        1,325                  53    4.17%   x 0.96
 clone_from_small             17           17                      0    0.00%   x 1.00
 clone_large                  1,371        1,362                  -9   -0.66%   x 1.01
 clone_small                  32           32                      0    0.00%   x 1.00
 insert_ahash_highbits        8,753        8,957                 204    2.33%   x 0.98
 insert_ahash_random          8,982        8,872                -110   -1.22%   x 1.01
 insert_ahash_serial          8,462        8,141                -321   -3.79%   x 1.04
 insert_erase_ahash_highbits  20,480       20,212               -268   -1.31%   x 1.01
 insert_erase_ahash_random    19,826       19,715               -111   -0.56%   x 1.01
 insert_erase_ahash_serial    19,543       19,168               -375   -1.92%   x 1.02
 insert_erase_std_highbits    38,157       38,102                -55   -0.14%   x 1.00
 insert_erase_std_random      38,685       38,385               -300   -0.78%   x 1.01
 insert_erase_std_serial      31,856       32,867              1,011    3.17%   x 0.97
 insert_std_highbits          17,661       17,811                150    0.85%   x 0.99
 insert_std_random            17,952       17,970                 18    0.10%   x 1.00
 insert_std_serial            15,414       15,504                 90    0.58%   x 0.99
 iter_ahash_highbits          1,153        1,201                  48    4.16%   x 0.96
 iter_ahash_random            1,060        1,203                 143   13.49%   x 0.88
 iter_ahash_serial            1,076        1,141                  65    6.04%   x 0.94
 iter_std_highbits            1,112        1,087                 -25   -2.25%   x 1.02
 iter_std_random              1,239        1,111                -128  -10.33%   x 1.12
 iter_std_serial              1,171        1,117                 -54   -4.61%   x 1.05
 lookup_ahash_highbits        5,998        5,831                -167   -2.78%   x 1.03
 lookup_ahash_random          5,611        5,674                  63    1.12%   x 0.99
 lookup_ahash_serial          5,405        5,393                 -12   -0.22%   x 1.00
 lookup_fail_ahash_highbits   5,407        5,167                -240   -4.44%   x 1.05
 lookup_fail_ahash_random     5,182        5,480                 298    5.75%   x 0.95
 lookup_fail_ahash_serial     5,014        5,475                 461    9.19%   x 0.92
 lookup_fail_std_highbits     14,310       14,504                194    1.36%   x 0.99
 lookup_fail_std_random       14,117       14,548                431    3.05%   x 0.97
 lookup_fail_std_serial       12,181       12,272                 91    0.75%   x 0.99
 lookup_std_highbits          14,178       14,635                457    3.22%   x 0.97
 lookup_std_random            14,449       14,862                413    2.86%   x 0.97
 lookup_std_serial            12,059       12,254                195    1.62%   x 0.98

@Amanieu
Copy link
Member

Amanieu commented Oct 4, 2020

The changes mostly look good. However they are quite invasive so I would feel more comfortable if we could test the impact of this change in compilation performance.

Could you try creating a PR in rust-lang/rust which switches libstd to use your branch of hashbrown so that we can do a perf run?

@bors
Copy link
Contributor

bors commented Oct 15, 2020

☔ The latest upstream changes (presumably #208) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

@Marwes Marwes force-pushed the smaller_ir branch 2 times, most recently from 3faab35 to 502f9fe Compare January 14, 2021 22:27
@Marwes Marwes force-pushed the smaller_ir branch 2 times, most recently from adb8127 to 0ec09ce Compare January 21, 2021 16:33
This was referenced Mar 15, 2021
bors pushed a commit to rust-lang-ci/rust that referenced this pull request Mar 16, 2021
Includes rust-lang/hashbrown#204 and rust-lang/hashbrown#205 (not yet merged) which both server to reduce the amount of IR generated for hashmaps.

Inspired by the llvm-lines data gathered in rust-lang#76680
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 18, 2021
feat: Update hashbrown to instantiate less llvm IR

Includes rust-lang/hashbrown#204 and rust-lang/hashbrown#205 (not yet merged) which both serve to reduce the amount of IR generated for hashmaps.

Inspired by the llvm-lines data gathered in rust-lang#76680 (cc `@Julian-Wollersberger)`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants